2,487 research outputs found

Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis

Authors: Ai Yang; Ling Zhen-Hua; Lu Ye-Xin
Publication date: 25/04/2023

This paper proposes a source-filter-based generative adversarial neural vocoder named SF-GAN, which achieves high-fidelity waveform generation from input acoustic features by introducing F0-based source excitation signals to a neural filter framework. The SF-GAN vocoder is composed of a source module and a resolution-wise conditional filter module and is trained based on generative adversarial strategies. The source module produces an excitation signal from the F0 information, then the resolution-wise convolutional filter module combines the excitation signal with processed acoustic features at various temporal resolutions and finally reconstructs the raw waveform. The experimental results show that our proposed SF-GAN vocoder outperforms the state-of-the-art HiFi-GAN and Fre-GAN in both analysis-synthesis (AS) and text-to-speech (TTS) tasks, and the synthesized speech quality of SF-GAN is comparable to the ground-truth audio. Comment: Accepted by NCMMSC 202

arXiv.org e-Print Archive
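
The SF-GAN abstract above says only that the source module derives an excitation signal from F0. A minimal sketch of one common way to do that follows, assuming a sine-based excitation for voiced frames and low-level Gaussian noise for unvoiced frames. The function name `sine_excitation`, its parameters, and the noise levels are illustrative assumptions, not details taken from the paper.

```python
import math
import random


def sine_excitation(f0_frames, hop_size, sample_rate,
                    voiced_noise_std=0.003, unvoiced_noise_std=0.03, seed=0):
    """Expand a frame-level F0 contour (Hz, 0 = unvoiced) to a sample-level signal.

    Assumption: voiced frames become a sine wave whose phase is accumulated
    sample by sample; unvoiced frames become Gaussian noise. This is a
    generic source-filter sketch, not the SF-GAN source module itself.
    """
    rng = random.Random(seed)
    two_pi = 2.0 * math.pi
    phase = 0.0
    excitation = []
    for f0 in f0_frames:
        for _ in range(hop_size):
            if f0 > 0:
                # Accumulate phase so the sine stays continuous across frames.
                phase = (phase + two_pi * f0 / sample_rate) % two_pi
                excitation.append(math.sin(phase) + rng.gauss(0.0, voiced_noise_std))
            else:
                excitation.append(rng.gauss(0.0, unvoiced_noise_std))
    return excitation


if __name__ == "__main__":
    # Two voiced frames at 100 Hz followed by one unvoiced frame.
    signal = sine_excitation([100.0, 100.0, 0.0], hop_size=80, sample_rate=16000)
    print(len(signal))
```

A neural filter would then take this signal together with the acoustic features; that part is not sketched here.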

Long-frame-shift Neural Speech Phase Prediction with Spectral Continuity Enhancement and Interpolation Error Compensation

Authors: Ai Yang; Ling Zhen-Hua; Lu Ye-Xin
Publication date: 17/08/2023

Speech phase prediction, which is a significant research focus in the field of signal processing, aims to recover speech phase spectra from amplitude-related features. However, existing speech phase prediction methods are constrained to recovering phase spectra with short frame shifts, which are considerably smaller than the theoretical upper bound required for exact waveform reconstruction of short-time Fourier transform (STFT). To tackle this issue, we present a novel long-frame-shift neural speech phase prediction (LFS-NSPP) method which enables precise prediction of long-frame-shift phase spectra from long-frame-shift log amplitude spectra. The proposed method consists of three stages: interpolation, prediction and decimation. The short-frame-shift log amplitude spectra are first constructed from long-frame-shift ones through frequency-by-frequency interpolation to enhance the spectral continuity, and then employed to predict short-frame-shift phase spectra using an NSPP model, thereby compensating for interpolation errors. Ultimately, the long-frame-shift phase spectra are obtained from short-frame-shift ones through frame-by-frame decimation. Experimental results show that the proposed LFS-NSPP method predicts long-frame-shift phase spectra with higher quality than the original NSPP model and other signal-processing-based phase estimation algorithms. Comment: Published at IEEE Signal Processing Letter

arXiv.org e-Print Archive
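
The three stages named in the LFS-NSPP abstract above (interpolation, prediction, decimation) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions: the interpolation is linear along time for each frequency bin (the abstract does not name the interpolation method), the frame-shift ratio is an integer `factor`, and `predict_phase` is a hypothetical stand-in for the NSPP model.

```python
def interpolate_frames(log_amp, factor):
    """Upsample frames along time, independently for each frequency bin.

    log_amp is a list of frames, each a list of per-bin log amplitudes.
    Assumption: plain linear interpolation between neighbouring frames.
    """
    out = []
    for prev, nxt in zip(log_amp, log_amp[1:]):
        for k in range(factor):
            w = k / factor
            out.append([(1.0 - w) * a + w * b for a, b in zip(prev, nxt)])
    out.append(list(log_amp[-1]))
    return out


def decimate_frames(frames, factor):
    """Keep every `factor`-th frame (frame-by-frame decimation)."""
    return frames[::factor]


def lfs_phase_prediction(long_log_amp, factor, predict_phase):
    """Interpolate, predict at the short frame shift, then decimate."""
    short_log_amp = interpolate_frames(long_log_amp, factor)
    short_phase = predict_phase(short_log_amp)  # hypothetical NSPP stand-in
    return decimate_frames(short_phase, factor)


if __name__ == "__main__":
    def zero_phase(frames):
        # Placeholder predictor: returns zero phase for every bin.
        return [[0.0 for _ in frame] for frame in frames]

    long_frames = [[0.0, 2.0], [4.0, 6.0], [8.0, 10.0]]
    print(lfs_phase_prediction(long_frames, factor=4, predict_phase=zero_phase))
```

With T long-shift input frames the interpolated sequence has (T - 1) * factor + 1 frames, so decimation returns exactly T frames aligned with the input.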

Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement

Authors: Ai Yang; Ling Zhen-Hua; Lu Ye-Xin
Publication date: 17/08/2023

Phase information has a significant impact on speech perceptual quality and intelligibility. However, existing speech enhancement methods encounter limitations in explicit phase estimation due to the non-structural nature and wrapping characteristics of the phase, leading to a bottleneck in enhanced speech quality. To overcome this issue, we propose MP-SENet, a novel Speech Enhancement Network which explicitly enhances Magnitude and Phase spectra in parallel. The proposed MP-SENet adopts a codec architecture in which the encoder and decoder are bridged by time-frequency Transformers along both time and frequency dimensions. The encoder aims to encode time-frequency representations derived from the input distorted magnitude and phase spectra. The decoder comprises dual-stream magnitude and phase decoders, directly enhancing magnitude and wrapped phase spectra by incorporating a magnitude estimation architecture and a phase parallel estimation architecture, respectively. To train the MP-SENet model effectively, we define multi-level loss functions, including mean square error and perceptual metric loss of magnitude spectra, anti-wrapping loss of phase spectra, as well as mean square error and consistency loss of short-time complex spectra. Experimental results demonstrate that our proposed MP-SENet excels in high-quality speech enhancement across multiple tasks, including speech denoising, dereverberation, and bandwidth extension. Compared to existing phase-aware speech enhancement methods, it successfully avoids the bidirectional compensation effect between the magnitude and phase, leading to a better harmonic restoration. Notably, for the speech denoising task, the MP-SENet yields a state-of-the-art performance with a PESQ of 3.60 on the public VoiceBank+DEMAND dataset. Comment: Submitted to IEEE Transactions on Audio, Speech and Language Processin

arXiv.org e-Print Archive
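
The MP-SENet abstract above mentions an anti-wrapping loss on phase spectra without defining it. The sketch below shows one common way to make a phase error insensitive to 2π wrapping: reduce the difference to its principal value before taking the magnitude. The names `anti_wrap` and `anti_wrapping_phase_loss` are illustrative, and the exact loss used in the paper may differ.

```python
import math


def anti_wrap(x):
    """Magnitude of a phase difference after removing whole turns of 2*pi.

    The result lies in [0, pi], so predictions that differ from the target
    by a multiple of 2*pi incur no penalty.
    """
    two_pi = 2.0 * math.pi
    return abs(x - two_pi * round(x / two_pi))


def anti_wrapping_phase_loss(predicted, target):
    """Mean anti-wrapped absolute error between two lists of phases (radians)."""
    errors = [anti_wrap(p - t) for p, t in zip(predicted, target)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # 3.1 and -3.1 rad are close on the circle even though they differ by 6.2.
    print(anti_wrapping_phase_loss([3.1], [-3.1]))
```

A plain absolute or squared error would treat 3.1 and -3.1 radians as far apart, which is why a wrapping-aware error of this kind is useful for wrapped phase targets.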

Learning Probabilistic Coordinate Fields for Robust Correspondences

Authors: Cao Zhiguo; Li Xin; Lu Hao; Ye Xinyi; Zhao Weiyue
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 07/06/2023

We introduce Probabilistic Coordinate Fields (PCFs), a novel geometric-invariant coordinate representation for image correspondence problems. In contrast to standard Cartesian coordinates, PCFs encode coordinates in correspondence-specific barycentric coordinate systems (BCS) with affine invariance. To know when and where to trust the encoded coordinates, we implement PCFs in a probabilistic network termed PCF-Net, which parameterizes the distribution of coordinate fields as Gaussian mixture models. By jointly optimizing coordinate fields and their confidence conditioned on dense flows, PCF-Net can work with various feature descriptors when quantifying the reliability of PCFs by confidence maps. An interesting observation of this work is that the learned confidence map converges to geometrically coherent and semantically consistent regions, which facilitates robust coordinate representation. By delivering the confident coordinates to keypoint/feature descriptors, we show that PCF-Net can be used as a plug-in to existing correspondence-dependent approaches. Extensive experiments on both indoor and outdoor datasets suggest that accurate geometric-invariant coordinates help to achieve the state of the art in several correspondence problems, such as sparse feature matching, dense image registration, camera pose estimation, and consistency filtering. Further, the interpretable confidence map predicted by PCF-Net can also be leveraged for other novel applications, from texture transfer to multi-homography classification. Comment: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligenc

arXiv.org e-Print Archive
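
The affine invariance of barycentric coordinates mentioned in the PCF abstract above can be checked numerically. The sketch below computes the barycentric coordinates of a 2-D point with respect to a triangle, applies the same affine map to the point and the triangle, and recomputes them. The triangle, the point, and the affine map are arbitrary illustrative values; how PCF-Net actually builds its correspondence-specific coordinate systems is not described in the abstract.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of 2-D point p w.r.t. triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    l1 = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    l2 = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return (l1, l2, 1.0 - l1 - l2)


def affine(pt, matrix, offset):
    """Apply the affine map x -> matrix @ x + offset to a 2-D point."""
    x = matrix[0][0] * pt[0] + matrix[0][1] * pt[1] + offset[0]
    y = matrix[1][0] * pt[0] + matrix[1][1] * pt[1] + offset[1]
    return (x, y)


if __name__ == "__main__":
    a, b, c = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
    p = (0.25, 0.25)
    matrix, offset = [[2.0, 1.0], [0.5, 3.0]], (5.0, -2.0)  # invertible map

    before = barycentric(p, a, b, c)
    after = barycentric(affine(p, matrix, offset),
                        affine(a, matrix, offset),
                        affine(b, matrix, offset),
                        affine(c, matrix, offset))
    print(before)
    print(after)
```

Up to floating-point error the two printed tuples match, whereas the Cartesian coordinates of the point change under the same map.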

Adiponectin improves coronary no-reflow injury by protecting the endothelium in rats with type 2 diabetes mellitus.

Authors: Han Xue; Liu Huirong; Liu Xin; Lv Tingting; Ma Lu; Ma Xin-Liang; Sun Qi; Wang Ke; Wang Wen; Wu Ye; Xu Wenli; Zhang Suli
Publication venue: Jefferson Digital Commons
Publication date: 01/01/2017

To determine the effect of adiponectin (APN) on the coronary no-reflow (NR) injury in rats with Type 2 diabetes mellitus (T2DM), 80 male Sprague-Dawley rats were fed with a high-sugar-high-fat diet to build a T2DM model. Rats received vehicle or APN in the last week and then were subjected to myocardial ischemia reperfusion (MI/R) injury. Endothelium-dependent vasorelaxation of the thoracic aorta was significantly decreased and serum levels of endothelin-1 (ET-1), intercellular cell adhesion molecule-1 (ICAM-1) and vascular cell adhesion molecule-1 (VCAM-1) were noticeably increased in T2DM rats compared with rats without T2DM. Serum APN was positively correlated with the endothelium-dependent vasorelaxation, but negatively correlated with the serum level of ET-1. Treatment with APN improved T2DM-induced endothelium-dependent vasorelaxation, recovered cardiac function, and decreased both NR size and the levels of ET-1, ICAM-1 and VCAM-1. Hypoadiponectinemia was associated with the aggravation of coronary NR in T2DM rats. APN could alleviate coronary NR injury in T2DM rats by protecting the endothelium and improving microcirculation.

Jefferson Digital Commons

Combination Therapy With Fingolimod and Neural Stem Cells Promotes Functional Myelination

Authors: Ciric Bogoljub; Li Xing; Lu Xin-Yu; Ma Cun-Gen; Rostami A. M.; Ye Ze-Qin; Zhang Guang-Xian; Zhang Yuan
Publication venue: Jefferson Digital Commons
Publication date: 05/02/2019

Myelination, which occurs predominantly postnatally and continues throughout life, is important for proper neurologic function of the mammalian central nervous system (CNS). We have previously demonstrated that the combination therapy of fingolimod (FTY720) and transplanted neural stem cells (NSCs) had a significantly enhanced therapeutic effect on the chronic stage of experimental autoimmune encephalomyelitis, an animal model of CNS autoimmunity, compared to using either one of them alone. However, reduced disease severity may be secondary to the immunomodulatory effects of FTY720 and NSCs, while whether this therapy directly affects myelinogenesis remains unknown. To investigate this important question, we used three myelination models under minimal or non-inflammatory microenvironments. Our results showed that FTY720 drives NSCs to differentiate into oligodendrocytes and promotes myelination in an ex vivo brain slice culture model, and in the developing CNS of healthy postnatal mice in vivo. Elevated levels of neurotrophic factors, e.g., brain-derived neurotrophic factor and glial cell line-derived neurotrophic factor, were observed in the CNS of the treated infant mice. Further, FTY720 and NSCs efficiently prolonged the survival and improved sensorimotor function of shiverer mice. Together, these data demonstrate a direct effect of FTY720, beyond its known immunomodulatory capacity, in NSC differentiation and myelin development as a novel mechanism underlying its therapeutic effect in demyelinating diseases.

Jefferson Digital Commons